ABSTRACT

Ways to improve the validity of assessment of college
students are discussed. Validity problems often have occurred because
the purpose of assessment was not clear. Traditionally, college
students have been tested for admission and placement. However, the
results of these tests have been used for other purposes, such as
comparing institutions. Whatever the intended purpose of testing, the
findings from assessment are often used for something else. College
students are commonly assessed for accountability, certification, or
institutional self-study. Assessment is an indicator of quality and
is also used as an intervention for educational change. Individual
instructional assessment (IIA) is proposed as an assessment model
which incorporates assessment with the institution's teaching
mission. IIA uses assessment to recognize and extend individual
student accomplishment. To implement the IIA system, commitment from
administrators and faculty would be necessary. In addition, the
measurement community would need to provide assistance, including
tests, technological support, training, and recommended procedures.
(dDC) 






Critical Validity Issues in the Methodology of
Higher Education Assessment

Eva L. Baker

UCLA Center for Student Testing, Evaluation, and Standards






Paper presented at the forty-seventh ETS Invitational Conference, sponsored
by Educational Testing Service, at The Plaza, New York City, on October 25,
1986.









Validity is the grand old concept of assessment. It stands for a complex
set of ideas involving the purposes of assessment, the match of
information obtained to such purposes, and the process by which
information is [illegible]. Validity in testing, in English, is about
truth. This paper focuses on increasing the validity of student
assessment in higher education.

Since validity is an apparent good, why do we have a problem with it
in higher education, or anywhere? Our validity problems occur because
we frequently are unclear about the purpose we are serving with our
assessments, a situation that also clouds the inferences we should make
from our findings.

Traditionally, at the postsecondary level we have tested students for
admission and placement. Admissions testing has drawn public attention
because of its centrality in the allocation of equal educational opportunity
and because the average admission test score has become a shorthand
description for the educational standards of colleges and universities,
the purported goodness of the education being directly related to the
difficulty of admission. More recently, the average admissions test score
has been applied in a similar way, to evaluate the precollegiate educational
effort. Although it has been common at private schools to judge educational
quality in terms of the number of students admitted to the most elite
postsecondary institutions, it was only relatively recently that such
college admission test scores were used to compare state educational
systems (U.S. Department of Education, 1985). Both uses of admissions
tests raise obvious problems relating to the validity of inferences: are we
talking about the quality of the educational institutions themselves, the
quality of their clients, or some unknown combination of the two?
Furthermore, such quantitative shorthand whets some appetites for other
simplified measures of educational quality. So, of increasing interest to the
postsecondary community and those compelled to comment about its
effectiveness, is the utility of student achievement measures for assessing
postsecondary educational quality. Driving these interests in student
assessment are legitimate public concerns about higher education costs
and benefits. The spate of attention to this area by the federal
establishment was perfectly predictable; as precollegiate educational
programs were shifted to States for management, the majority of the
remaining federal educational investment was directed to postsecondary
students. Accountability went to college.

Present Methods

From all reports, each of the existing systematic assessments of student
academic performance in colleges and universities has developed through
top-down mandate. How high up that top is varied, with the present
ceiling at the statehouse. The intended purposes served by such mandated
student assessment include accountability (reporting to legislatures),
certification (verifying performance for existing teachers), or institutional
self-study (McClain and Krueger, 1985). Although assessment systems
may begin with one ostensible purpose (who goes to what segment of
higher education), a mutation such as outcome assessment is not hard to
imagine. A major fact about testing is that whatever its original purpose,
the findings from assessment are always used for something else.

From all appearances, many existing assessments of postsecondary
students share the methodology and flavor of precollegiate, large-scale
testing activities. The measures are standardized. They are formulated for
and administered to the group. They often focus on minimums. They
have great symbolic value, and their functional value is unknown. To the
extent that student assessment measures become widespread, I will
predict that their original purposes will be transformed and that they will
also drive out other indicators used to evaluate comprehensively the
quality of higher education institutions. Simply look at precollegiate
education as relevant history. Mandated large-scale testing occurred
because the precollegiate system had no convincing information about its
quality. No information was available to refute claims that kids couldn't
read and write, let alone do fractions and analyze Shakespeare.
Assessment as an [...] Reform

At the heart of this discussion is assessment as a bureaucratic
tool. Bureaucracies see assessment programs serving at
least two purposes: first, to assess quality; second, and
increasingly more important, as an intervention. In precollegiate
education, for instance, [...] testing is seen in itself as a
major educational reform, [...] way to measure the effects of
changes in educational [...]: a classic quick fix. The rhetorical
benefits of formal assessment [...] articulate standards, focus
instruction, motivate students, [...] feet to the fire, etc. The feared
costs of such assessments include [...] to trivia the important goals of
education, increasing the [...], generating systematic attempts to
"get around" the mandates, narrowing the curriculum, and so on. Studies
of actual effects of testing reforms will be released shortly and some light
may be shed on the utility of assessment as a productive instrument of
educational change.



Assessment as a Quality Indicator

Student achievement is a legitimate and important indicator of
educational quality. If they are to be used as part of a system of higher
education, student assessment programs must be constantly held to their
purpose: to provide an accurate and representative reflection of
educational quality. Methodology used in student assessment does not meet
this purpose. In my view, student assessment programs must intrinsically
relate to real instructional programs in departments and courses. They
must reflect the diversity of our offerings and what students learn from
their coursework and their college experience. At present, we have
relatively little evidence to document the effects of our educational efforts
in higher education. I believe we can collect such evidence in a way that
will avoid the bureaucratic and irrelevant character of much top-down
assessment. We should try to avoid the use of omnibus assessment where
a single instrument is purported to be a major valid indicator of quality.
The nature of higher education is such that using a single common
measure to reflect student learning will provide very little valid
information about educational quality. Most everything will be missed. We
may, better still, find a way to use student performance assessment as a
powerful instrument of improvement.

Developing an Approach to
Individual Instructional Assessment

The model for student assessment in higher education I propose is one
that incorporates student assessment as part of the teaching mission of the
institution (Cross, 198[?]). Its purpose is to contribute to the development
of educational quality. Call it individual instructional assessment (IIA). IIA
develops from a view that colleges and universities have teaching
responsibilities to individual students. The teaching responsibilities for
individual students get executed as students relate to one another, to
professors, to teaching assistants, and to other institutional resources. The
product of this individual experience is what we should assess. Even though
teaching is sometimes a mass act, its reality occurs in the complex
interaction among the students and all these resources (Pace, 198[?]). To
acknowledge and assess the individual, distinct, personalized nature of this
experience is critical. However, such acknowledgement should not be confused
with models of instruction (such as those advanced and tested by Keller (1969)
and Bloom (1967; [...])). IIA does not presuppose self-paced instruction
and is independent of instructional strategy. The purpose of IIA is to use
assessment as a way to recognize and extend individual student
accomplishment rather than to homogenize it. Its slogan was promulgated by
Judah Schwartz (197[?]), in other contexts, some years ago: "People come
in groups of one." So do higher education institutions.

A new approach to student assessment in postsecondary education is
needed. This approach would use as its centerpiece the specific
accomplishments of students in academic courses and courses of study, instead
of their performance on specially constructed, mandated measures. So I
will not discuss today a procedure to develop particular instruments.
Outcomes of higher education would be documented by providing a
wide range of examples of the kind of work accomplished by students at
various levels and majors. The system would not be uniformly applied to
all courses, nor would exhaustive reporting be expected. Rather, an
institutional portfolio would be created. If numbers are required, as they
almost always are, frequencies of students performing at the illustrated
level or above would be provided for the academic majors assessed. It is
a bottom-up demonstration of quality, clearly superior, I think, to
judgments made on the basis of transcript analyses or catalog review.

The characteristics desired of such measures are obvious. The common,
casually developed tests of knowledge and information in rampant use
could realistically provide only a piece of the information. New, carefully
developed tasks for essay examination or term papers would be prepared.
Criteria for judging the quality of responses would also be articulated. In
operation, these assessments would be administered on a schedule
naturally demanded by course organization. Feedback to students would be
provided rapidly and in a way that strengthens the personal nature of the
college experience.

What those tasks should be and the form of the feedback should be a
faculty matter. Educational quality, in terms of what and how well
students learn the full range of academic offerings, will thus be directly
affected. Whereas present institutionally generated student assessment is
focused on scheduled, quantitative summaries of students' performance,
IIA is periodic, qualitative, formative, diagnostic, and informative. IIA
would also serve to increase rather than decrease the range of approaches
used to assess learning. It also has particular strength as a means to
provide careful, differentiated feedback for students.

Of course, such a position requires a massive effort to train faculty
members. They need to see that the way they assess students
communicates what they view as important to learn. They need to believe that
careful, timely, and personalized feedback can transform the college
experience for students. They need to see assessment as more than a
means to grade students or to meet bureaucratic requirements. It must
contribute to their teaching effectiveness.

Do faculty care enough to engage in the serious work of developing
high quality measures of course performance? We know they are
relatively unskilled now. Whether some would embrace the use of high
quality measurement approaches (such as domain-referenced assessment)
remains to be seen.

What conditions are required for such a system to work?

• Agreement from top management that such an approach would
directly rather than indirectly both impact and reflect higher education
quality and that it is worth doing and superior to approaches using
single measures.

• Incentives for faculty to take this responsibility seriously. 

• A plan for institutional development, first to find leading academic
institutions willing to undertake a pilot effort and, within institutions,
prestigious academic departments to provide the model for others.

• Useful approaches, tools, and training procedures from the
measurement community.

Necessary Contributions from the
Measurement Community

Colleges and universities, if they were to take seriously and
systematically the charge to improve educational quality, need certain
assistance from the measurement community. For example, approaches to the
measurement of deep understanding of subject matter would need
expansion. In a project in this domain we are attempting to develop procedures
for assessing essays and term papers that incorporate appropriate
cognitive representation of subject matter (Baker & Herman, 1986), reliable
and valid scoring of student responses, and procedures that do not demand
inordinate time to evaluate each student's effort (Quellmalz). The
measurement community needs to expand the options it offers college
professors to assess subject matter and cognitive understanding.

Secondly, technological supports to the development of assessments
are at least on the drawing board (Baker & Linn, 1985). The search should
intensify for procedures to use computer technology to represent subject
matter knowledge and to develop locally appropriate measures of student
performance. As part of the new OERI Center for Research on Testing, we have
a design project to explore techniques from artificial intelligence to create
a test developer assistant (Baker, 1986).

Third, help from offices of institutional research and evaluation is
needed to provide the structure and training required for such an
experiment to work.



Summary of Potential Effects 

If successful, the results of IIA should be:

• to deepen the sense of intellectual engagement of students by requiring 
of them high level, defensible performance, and by providing timely 
individualized feedback, 

• to stimulate faculty reflection on the real teaching mission of colleges 
and universities, 

• to avoid the use of marginally valid measures in the assessment of 
higher education, and 

• to provide appropriate indicators of higher education quality, in the
form of institutional portfolios.

In this way, we can contribute to the responsible assessment of our
higher education institutions. We must recognize that our institutions are
complex, our students are different, and that our assessment approaches
need to reflect those complexities.